Physics Analysis Expert PAX: First Applications

PAX (Physics Analysis Expert) is a novel, C++ based toolkit designed to assist teams in particle physics
data analysis issues. At the core of PAX are event interpretation containers, holding relevant information about
and possible interpretations of a physics event. Providing this new level of abstraction beyond the results of 
the detector reconstruction programs, PAX facilitates the buildup and use of modern analysis factories. Class 
structure and user command syntax of PAX are set up to support expert teams as well as newcomers in preparing 
for the challenges expected to arise in the data analysis at future hadron colliders. 



1. Motivation 



Working directly on the output of detector reconstruc- 
tion programs when performing data analyses is an 
established habit amongst particle physicists. Nev- 
ertheless, at the experiments of HERA and LEP it 
turned out to be an advantage to have uniform access 
to calorimeter energy depositions, tracks, electrons, 
muons etc. which requires a new level of abstraction on 
top of the reconstruction layer. Examples of physics
analysis packages providing this level are H1PHAN 1
and ALPHA 2 of the H1 and ALEPH experiments.
Noticed effects were, amongst others, that

a) users could relatively quickly answer physics ques- 
tions, 

b) the physics analysis code was protected against 
changes in the detector reconstruction layer, 

c) and finally, the management liked the fact that the
relevant reconstruction output had been used in the
analysis.

While previous programs were designed to provide
a rather complete view of the event originating from a
single e+e- or ep scattering, a next-generation package
is challenged by hadron collisions with O(20)
simultaneous events. This implies a large number of
possible interpretations of the triggered events and
sometimes requires the analysis of dedicated regions
of interest. The "Physics Analysis Expert" toolkit
(PAX) is a data analysis utility designed to assist
physicists in these tasks in the phase between detector
reconstruction and physics interpretation of an event
(Fig. 1). The alpha version of PAX was presented at
the HCP2002 conference [1]. In this contribution we
introduce the beta version.



2. Guidelines for the PAX Design 

The design of the next generation physics analysis util- 
ity PAX has been developed according to the guide- 
lines listed below: 

1. The package is a utility toolbox in the sense that
the user has full control of every step in the
program execution.

2. The programming interface should be as simple
and intuitive as possible. This minimizes
the need to access the manual and thereby
increases the acceptance in the community.
Furthermore, simplicity also enables physicists with
a limited time budget or limited knowledge of
object-oriented programming to carry out complex
physics analyses.

3. The package supports modular physics analysis
structures and thus facilitates teamwork. The
complexity of today's and future analyses makes
teamwork of many physicists mandatory.

4. Existing physics analysis utilities can be connected.
Examples are tools for fourvector gymnastics,
which are available in general form (e.g.
in the CLHEP library 3), as well as
histograms, fitting routines etc.

5. The physics analysis package can be used con- 
sistently among different high energy physics ex- 
periments. 



1 H1 Collaboration, internal software manual for H1PHAN

2 ALEPH Collaboration, "ALPHA", internal note 99-087

3 CLHEP, A Class Library for High Energy Physics, SOFTWR 99-001,
http://proj-clhep.web.cern.ch/proj-clhep/

6. Many use cases are to be taken care of. The
following list is certainly not complete:

(a) Access to the original data of the experiment
is possible at each stage of the
physics analysis. This enables detailed con- 
trol of all analysis steps, and access to 
experiment-dependent information and re- 
lated methods. 

(b) When studying events of a Monte Carlo 
generator, relations between generated and 
reconstructed observables are accessible at 
any stage of the analysis. This allows the 
quality of the detector reconstruction to be 
studied in detail. 

(c) Without significant code changes, a com- 
plete analysis chain can be tested with dif- 
ferent input objects such as reconstructed 
tracks, generated particles, fast simulated 
particles etc. 

(d) Relations between reconstructed physics 
objects (tracks, muons, etc.) and vertices 
are available, as well as hooks for separat- 
ing multiple interactions. 

(e) The decay chains with secondary, tertiary 
etc. vertices can be handled in events with 
multiple interactions. 

(f) Information of different objects can be 
combined, e.g., tracks and calorimeter in- 
formation. 

(g) Reconstruction ambiguities are a common
challenge in data analysis and need to
be handled. Administration of these
ambiguities is supported.

(h) The user finds assistance in developing 
analysis factories with multiple physics 
data analyses carried out simultaneously. 

To cope with these challenges, the advantage of using
an object-oriented language is obvious. For the
convenience of connecting to other packages, C++ was the
language of choice for the realisation of PAX.

Figure 2: The basic unit in PAX: the event
interpretation together with the classes for collisions, 
vertices, fourvectors, and user records. 



3. PAX Class Structure Implementation 
3.1. Event Interpretation 

The basic unit in PAX is a view of the event which
we call "event interpretation". The event interpretation
is a container used to store the relevant information
about the event in terms of collisions, vertices,
fourvectors, their relations, and additional values
needed in the user's analysis. The corresponding
class is denoted PaxEventInterpret (Fig. 2). The
user books, fills, draws, copies, and advances the event
interpretation, and has the ultimate responsibility for
deleting it. When the user finally deletes an instance
of PaxEventInterpret, instances of objects which have
been registered with this event interpretation
(collisions, vertices, fourvectors, etc.) are also removed
from memory.

A copy of an event interpretation is a physical copy
in memory. It is generated preserving all values of
the original event interpretation, and with all relations
between collisions, vertices, and fourvectors corrected
to remain within the copied event interpretation. In
addition, the histories of the individual collisions,
vertices, and fourvectors are recorded. The copy
functionality simplifies producing several similar event
interpretations. This is advantageous typically in the
case of many possible views of the event that differ in
a few aspects only. Although the event interpretations
do not know of each other's existence, recording the
analysis history of collisions, vertices and fourvectors
requires intermediate event interpretations to exist.
Therefore, we recommend deleting all event
interpretations together after an event has been analysed.
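
The copy semantics described above can be illustrated with a minimal sketch. It is not the PAX implementation: SketchFourVector and SketchEventInterpretation are hypothetical stand-ins, and storing relations as indices into the owning container is an assumption made here so that relations automatically stay inside the copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fourvector stored in an event interpretation.
struct SketchFourVector {
    double e = 0.0;                              // energy
    const SketchFourVector* previous = nullptr;  // history: instance this one was copied from
    std::vector<std::size_t> daughters;          // relations as indices into the owning container
};

// Hypothetical stand-in for an event interpretation that owns its fourvectors.
class SketchEventInterpretation {
public:
    SketchEventInterpretation() = default;

    // Physical copy: values are preserved, relations remain inside the copy,
    // and every copied fourvector records the instance it was copied from.
    SketchEventInterpretation(const SketchEventInterpretation& other) {
        for (const auto& fv : other.items_) {
            auto copy = std::make_unique<SketchFourVector>(*fv);
            copy->previous = fv.get();
            items_.push_back(std::move(copy));
        }
    }
    SketchEventInterpretation& operator=(const SketchEventInterpretation&) = delete;

    // Register a new fourvector and return its index.
    std::size_t add(double energy) {
        auto fv = std::make_unique<SketchFourVector>();
        fv->e = energy;
        items_.push_back(std::move(fv));
        return items_.size() - 1;
    }

    // Record a mother-daughter relation between two registered fourvectors.
    void relate(std::size_t mother, std::size_t daughter) {
        assert(mother < items_.size() && daughter < items_.size());
        items_[mother]->daughters.push_back(daughter);
    }

    SketchFourVector& at(std::size_t i) { return *items_.at(i); }
    const SketchFourVector& at(std::size_t i) const { return *items_.at(i); }
    std::size_t size() const { return items_.size(); }

private:
    // Deleting the interpretation also deletes everything registered with it.
    std::vector<std::unique_ptr<SketchFourVector>> items_;
};
```

In this sketch the history pointer of a copied fourvector refers to memory owned by the original interpretation, which is one way to see why all interpretations of an event are best deleted together.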

Besides the features mentioned, the PaxEventInterpret
class also defines an interface to algorithms such as jet
algorithms, missing transverse energy calculations etc.
This eases the exchange of algorithms within or
between analysis teams.

3.2. Physics Quantities 

PAX supports three physics quantities: collisions,
vertices, and fourvectors. Three classes have been defined
correspondingly. The class PaxCollision provides the
hooks to handle multi-collision events (Fig. 2).
Vertices and fourvectors are defined through the classes
PaxVertex and PaxFourVector. Since the user may
need to apply vector operations to them, both PAX
classes inherit from the CLHEP classes Hep3Vector
and HepLorentzVector, respectively. Their
functionalities are available to the user. The PaxVertex and
PaxFourVector classes contain additional functionality
which mainly results from features proven to be
useful in the previously mentioned H1PHAN package.

For all physics quantities the user can store
additional values (data type double) needed by the
analysis via the class PaxUserRecord. These values are
registered together with a key (data type string
functioning as a name), which must be given by the user.
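
As an illustration of this design, the following sketch shows a physics quantity that inherits vector functionality and carries a user record of string keys and double values. All names are hypothetical: LorentzVectorStandIn only mimics the role of the CLHEP base class, and SketchParticle is not a PAX class.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Stand-in for a Lorentz vector base class; this is NOT the CLHEP class.
class LorentzVectorStandIn {
public:
    LorentzVectorStandIn(double px, double py, double pz, double e)
        : px_(px), py_(py), pz_(pz), e_(e) {}
    double px() const { return px_; }
    double py() const { return py_; }
    double pz() const { return pz_; }
    double e() const { return e_; }
    // Transverse momentum, as an example of an inherited vector operation.
    double perp() const { return std::hypot(px_, py_); }

private:
    double px_, py_, pz_, e_;
};

// Hypothetical physics quantity: inherits the vector operations and adds
// a user record of named double values.
class SketchParticle : public LorentzVectorStandIn {
public:
    using LorentzVectorStandIn::LorentzVectorStandIn;

    // Store or overwrite a value under a user-chosen key.
    void setUserRecord(const std::string& key, double value) { record_[key] = value; }

    bool hasUserRecord(const std::string& key) const { return record_.count(key) != 0; }

    // Throws std::out_of_range if the key was never registered.
    double userRecord(const std::string& key) const { return record_.at(key); }

private:
    std::map<std::string, double> record_;
};
```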

3.3. Relation Management for the 
Physics Quantities 

Relations between collisions, vertices, and fourvectors
are handled by a separate class called
PaxRelationManager (Fig. 3). Here we followed the design pattern
Mediator. Examples of relations to be handled
between physics quantities are fourvectors which
originate from the primary vertex, an incoming fourvector
to a secondary vertex, or connections between
multiple collisions and their vertices etc.

When physics quantities are copied, the copied in- 
stance carries a pointer to the previous instance. An 
example would be a fourvector which is copied to- 
gether with an event interpretation. In this way, the 
full history of the fourvector is kept throughout the 
analysis.

Figure 3: The relations in PAX enable storage of decay
trees, records of analysis history, access to experiment 
specific classes, and exclusion of parts of the event from 
the analysis. 



The relation manager also allows parts of the event
to be excluded from the analysis. An example would
be a lepton which needs to be preserved while
applying a jet algorithm. This locking mechanism is built
in the form of a tree structure which enables
sophisticated exclusion of unwanted event parts. For example,
locking a collision excludes the vertices and
fourvectors connected to this collision (Fig. 4). In the case of
locking a secondary vertex, PAX will lock the decay
tree starting at this vertex.






Figure 4: Excluding a collision (left) or a vertex (right) 
from an analysis using the lock mechanism excludes all 
vertices and fourvectors originating from the excluded 
object. 

Note that when locking a fourvector f, all
fourvectors related to f through its history record are locked
as well. The same functionality applies to collisions
and vertices. Unlocking the fourvector f removes the
lock flag of all physics quantities related to f through
the decay tree, or the history record at the time the
unlock command is executed.
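
The tree-structured locking can be sketched as a recursive flag propagation. This is a simplified illustration, not the PAX relation manager: SketchNode, setLock and unlocked are hypothetical names, and the sketch assumes an acyclic decay tree and follows only backward history pointers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node standing in for a collision, vertex, or fourvector.
struct SketchNode {
    bool locked = false;
    std::vector<SketchNode*> daughters;  // decay-tree relations (non-owning)
    SketchNode* previous = nullptr;      // history: instance this one was copied from
};

// Set or clear the lock flag on a node, on everything below it in the
// decay tree, and on its predecessors in the history record.
inline void setLock(SketchNode& node, bool value) {
    node.locked = value;
    for (SketchNode* d : node.daughters) {
        assert(d != nullptr);
        setLock(*d, value);
    }
    for (SketchNode* p = node.previous; p != nullptr; p = p->previous) {
        p->locked = value;
    }
}

// Collect the entries that are not locked, e.g. as input to a jet algorithm.
inline std::vector<SketchNode*> unlocked(const std::vector<SketchNode*>& all) {
    std::vector<SketchNode*> result;
    for (SketchNode* n : all) {
        if (n != nullptr && !n->locked) result.push_back(n);
    }
    return result;
}
```

An algorithm that only consumes the result of unlocked() never sees a locked lepton, which corresponds to the use case of preserving an electron candidate during jet finding.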

For some applications, the user may want to request
additional information about a physics quantity which is
only contained in an experiment-specific class. An
example is a PaxFourVector instance originating from a
track of which the user wants to check the χ²
probability of the track fit. The relation manager allows the user to
register instances of experiment-specific classes which
led to a PaxCollision, PaxVertex, or PaxFourVector
instance. To enable such relations, a template class
PaxExperiment<> has been defined which allows
registration of arbitrary class instances. Applying the
C++ dynamic_cast operator, the user can recover the
original instance and access all member functions of
the experiment-specific class.
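
The combination of a template wrapper and dynamic_cast can be sketched as follows. The names SketchRelationBase, SketchExperimentRelation, recover and SketchTrack are hypothetical and chosen for this illustration only; the actual PaxExperiment<> interface may differ.

```cpp
#include <cassert>

// Common polymorphic base so that relations to arbitrary classes
// can be stored through one pointer type.
class SketchRelationBase {
public:
    virtual ~SketchRelationBase() = default;
};

// Hypothetical template wrapper holding a non-owning pointer to an
// instance of an experiment-specific class T.
template <class T>
class SketchExperimentRelation : public SketchRelationBase {
public:
    explicit SketchExperimentRelation(T* instance) : instance_(instance) {
        assert(instance != nullptr);
    }
    T* get() const { return instance_; }

private:
    T* instance_;
};

// Recover the original instance; returns nullptr if the stored type is not T.
template <class T>
T* recover(SketchRelationBase* relation) {
    auto* typed = dynamic_cast<SketchExperimentRelation<T>*>(relation);
    return typed ? typed->get() : nullptr;
}

// Made-up experiment-specific class used only for this illustration.
struct SketchTrack {
    double fitChi2Probability;
};
```

Asking for the wrong type yields a null pointer instead of undefined behaviour, which is the main reason to prefer dynamic_cast over an unchecked cast in such a registry.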

3.4. Container and Iterator 

PAX uses the template class map<> from the
Standard Template Library (STL) to manage pairs of
keys and items in a container. The user record in
Fig. 2 is an example of such a container for pairs of
data type string and double. For accessing a certain
item, optimized STL algorithms search the map for
the corresponding key and provide access to the item.

All PaxCollision, PaxVertex, and PaxFourVector
instances carry a unique identifier of type PaxId which
is used as the key in the PaxCollisionMap,
PaxVertexMap, and PaxFourVectorMap of an event
interpretation (Fig. 2). Pointers to the collision, vertex, and
fourvector instances are the corresponding items. In
this way, fast and uniform access to the individual
physics quantities is guaranteed.

For users not familiar with STL iterators, we
provide the PaxIterator class which gives a simple and
unified command syntax for accessing all containers
in PAX.
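
A minimal sketch of an identifier-keyed container with a simplified iterator wrapper is given below. The types SketchId, SketchVertex, SketchVertexMap and SketchIterator are hypothetical, and the first()/next() syntax is an assumption made for illustration rather than the documented PaxIterator interface.

```cpp
#include <cassert>
#include <map>

using SketchId = unsigned long;  // stand-in for a unique identifier type

// Made-up vertex type used only for this illustration.
struct SketchVertex {
    double x, y, z;
};

// Key: unique identifier, item: non-owning pointer to the instance.
using SketchVertexMap = std::map<SketchId, SketchVertex*>;

// Minimal wrapper hiding STL iterator syntax behind first()/next() calls.
// It assumes that the mapped type of the container is a pointer.
template <class Map>
class SketchIterator {
public:
    explicit SketchIterator(Map& map) : map_(map), pos_(map.begin()) {}

    // Rewind and return the first item, or nullptr for an empty container.
    typename Map::mapped_type first() {
        pos_ = map_.begin();
        return current();
    }

    // Advance and return the next item, or nullptr past the end.
    typename Map::mapped_type next() {
        if (pos_ != map_.end()) ++pos_;
        return current();
    }

private:
    typename Map::mapped_type current() const {
        return pos_ == map_.end() ? nullptr : pos_->second;
    }

    Map& map_;
    typename Map::iterator pos_;
};
```

A loop then reads `for (SketchVertex* v = it.first(); v != nullptr; v = it.next())`, visiting the items in ascending order of their identifiers.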

3.5. Documentation 

The PAX user guide is available on the web. In
addition to a paper version of the manual, we provide
a fast navigator web page which can be used as a
reference guide.



4. Application within Physics Analysis of 
the CDF Experiment 

The PAX package is being explored by the Karlsruhe CDF
group in top quark analyses. Example Feynman
diagrams of signal and background processes relevant to
top analysis in the so-called electron plus jet channel
are shown in Fig. 5.

4.1. Combining Results of the Detector 
Reconstruction Program 

The CDF detector reconstruction program already
provides excellent reconstruction algorithms for the
calorimeter, track finding, electron identification etc.




Figure 5: Example Feynman diagrams relevant to top 
quark analysis in proton-antiproton collisions. 



To further advance these results we fill them into
instances of the PaxEventInterpret class (Fig. 6).
Examples are the calorimeter energy measurements, the
tracking output, jet searches, and electron, muon, and
photon identification. For the graphical
representation of the different event interpretations in Fig. 6 we
use the ROOT package [6]. The lines indicate the
direction of the fourvectors, and the lengths correspond
to their energies. In order to optimize the energy
measurements and take into account the advanced
particle identification algorithms of the reconstruction
software, we combine different results into a single event
interpretation.

While combining the calorimeter output with the
electrons is relatively straightforward, an algorithm to
combine the calorimeter and track information needs
to treat the energy which is measured in both
subdetectors in order to avoid double counting of energy.
In Fig. 6 the quality of our combined energy
measurement is tested. For tt̄ events of the Herwig Monte
Carlo generator, the vertical axis of the histogram shows
the reconstructed transverse energy sum as a function
of the true transverse energy sum. The latter was
determined from the generated hadrons and leptons,
excluding neutrinos. The algorithm provides a good
measurement of the event total transverse energy.

4.2. Top Quark Analysis Factory 

To optimize separation of signal from background
processes, we set up an "analysis factory" based on
the PAX event interpretation concept. In the top
quark factory, every event is examined with respect
to different processes which include electroweak and
strong production of top quarks, as well as W- and
Z-production (Fig. 5). We attempt a complete
reconstruction of the partonic scattering process.

Figure 6: Combining different output of the detector
reconstruction program results in a good reconstruction
quality of the transverse energy in tt̄ events of the Herwig
Monte Carlo generator.

Some aspects of the event analysis of a tt̄ event are
demonstrated in Fig. 7. The first picture shows the
situation after applying a jet finding algorithm to the
combined reconstruction information shown in Fig. 6.
The electron candidate has been preserved using the
locking mechanism. The lines indicate the fourvectors
of the 4 jets, the electron, and one fourvector which
includes all remaining unclustered energy depositions.

Figure 7: Full reconstruction of the partonic scattering
process of a Herwig tt̄ event.

In the second row, a W-boson decaying into an
electron and a neutrino is reconstructed. From the
W-mass constraint two possible solutions can be deduced
for the longitudinal neutrino momentum which
correspond to two W event interpretations. Combining
the W with one of the jets leads to top quark
solutions, two of which are shown in the bottom row of
Fig. 7. The reconstructed top quark candidates point
in different directions. In this four jet event, 24 tt̄
solutions can be constructed. Although the number of
remaining fourvectors is relatively small, the full
information of the original O(1000) calorimeter energy
depositions, tracks etc. can still be accessed from each
of the event interpretations.
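
The two neutrino solutions follow from a quadratic equation in the longitudinal neutrino momentum. The sketch below is one standard way to solve it, not necessarily the code used in the analysis. It assumes a massless charged lepton and a transverse neutrino momentum taken from the missing transverse energy. With mu = mW^2/2 + pT(lepton) . pT(neutrino), the solutions are pz = [mu pz(lepton) +- E(lepton) sqrt(mu^2 - pT(lepton)^2 pT(neutrino)^2)] / pT(lepton)^2. The names SketchMomentum and neutrinoPz are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Made-up momentum type used only for this illustration.
struct SketchMomentum {
    double px, py, pz;
};

// Solve the W-mass constraint for the longitudinal neutrino momentum.
//   lepton      momentum of the charged lepton, treated as massless
//   nuPx, nuPy  transverse neutrino momentum (e.g. missing transverse energy)
//   mW          assumed W-boson mass, in the same units as the momenta
// Returns two solutions, one degenerate solution, or none if the
// discriminant is negative.
inline std::vector<double> neutrinoPz(const SketchMomentum& lepton,
                                      double nuPx, double nuPy, double mW) {
    const double ptl2 = lepton.px * lepton.px + lepton.py * lepton.py;
    assert(ptl2 > 0.0);  // a purely longitudinal lepton gives no constraint here
    const double el = std::sqrt(ptl2 + lepton.pz * lepton.pz);
    const double ptn2 = nuPx * nuPx + nuPy * nuPy;
    const double mu = 0.5 * mW * mW + lepton.px * nuPx + lepton.py * nuPy;
    const double disc = mu * mu - ptl2 * ptn2;
    if (disc < 0.0) return {};
    const double a = mu * lepton.pz / ptl2;
    const double b = el * std::sqrt(disc) / ptl2;
    if (b == 0.0) return {a};
    return {a - b, a + b};
}
```

Each returned value would correspond to one W event interpretation. One counting that reproduces the 24 solutions quoted above is 2 neutrino solutions times 4 choices of the jet combined with this W times 3 choices of the jet assigned to the second top quark; this breakdown is an inference and is not spelled out in the text.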

We select the most likely tt̄ event interpretation by
first demanding a non-zero bottom quark probability
for one of the jets of the top candidate. Fig. 8a shows
that the number of remaining event interpretations
is still relatively large. We select one of these
solutions by using a simple χ² test on the reconstructed
W-boson and top quark masses. In Fig. 8b, the
reconstructed mass of the top quark in the electron plus
jet decay channel is shown for all events (histogram).
The symbols indicate the number of events in which
the correct top quark candidate was found.
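
A χ² selection of this kind can be sketched as follows. The exact form of the test is not given in the text, so the sum of two squared pulls, the reference masses and the resolutions are all assumptions to be supplied by the analyst. SketchCandidate, SketchChi2Config, chi2 and selectBest are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reconstructed masses of one event interpretation (hypothetical type).
struct SketchCandidate {
    double wMass;
    double topMass;
};

// Reference masses and assumed resolutions; all values are analyst input.
struct SketchChi2Config {
    double wRef, wSigma, topRef, topSigma;
};

// Assumed form of the test: sum of squared pulls of the two masses.
inline double chi2(const SketchCandidate& c, const SketchChi2Config& cfg) {
    const double dw = (c.wMass - cfg.wRef) / cfg.wSigma;
    const double dt = (c.topMass - cfg.topRef) / cfg.topSigma;
    return dw * dw + dt * dt;
}

// Index of the candidate with the smallest chi2; the list must not be empty.
inline std::size_t selectBest(const std::vector<SketchCandidate>& candidates,
                              const SketchChi2Config& cfg) {
    assert(!candidates.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        if (chi2(candidates[i], cfg) < chi2(candidates[best], cfg)) best = i;
    }
    return best;
}
```

In an analysis factory the candidate list would hold one entry per surviving event interpretation, and the returned index identifies the interpretation to keep.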
Figure 8: a) Multiplicity of event interpretations, and
b) reconstructed mass of the top quark decaying into a 
bottom quark and a W-boson which subsequently decays 
into an electron and a neutrino. The symbols represent 
the events where the selected event interpretation was 
the correct one. 



5. Application within Physics Analysis of 
the CMS Experiment 



The Karlsruhe CMS group uses PAX within Higgs
search studies. The applications shown here are
benchmark tests where the Higgs boson decays into
ZZ* with subsequent decays into four muons (Fig. 9).
We simulated this process with the Monte Carlo
generator PYTHIA assuming a hypothetical Higgs
mass of m_H = 130 GeV. As background process we
considered Zbb̄ production, which we generated using
COMPHEP followed by the LUND string
fragmentation model within PYTHIA.

Whereas in the Zbb̄ process the muons result from
the Z boson and two bottom quark jets, in Higgs
events the four muons are the decay products of the Z
and Z*. Thus, to reconstruct the Higgs, all muons of a
generated event are filled into an event interpretation
and, with the help of a likelihood method, a Z and
a Z* are reconstructed. Combining the Z and Z* then
results in the Higgs mass spectrum shown in Fig. 10a.

Figure 9: Example Feynman diagrams of Higgs
production and the Zbb̄ background process.

Figure 10: Reconstructed Higgs mass for an integrated
luminosity of 20 fb⁻¹ using a) generated and
b) reconstructed muons of generated Higgs signal events
and Zbb̄ background events.

The same analysis has been used in a full simulation
study, where the detector response was simulated with
CMSIM 4 and the muons reconstructed with the CMS
reconstruction software ORCA 5. As shown in Fig. 10b,
the quality of the reconstructed Higgs mass spectrum
is still satisfactory. Please note that both results,
based on generated and reconstructed muons, were
obtained by using the identical analysis code. This
is easily possible due to the new level of abstraction
which is provided by the PAX toolkit.



4 CMSIM - CMS Simulation and Reconstruction Package,
http://cmsdoc.cern.ch/cmsim/cmsim.html

5 ORCA - Object-oriented Reconstruction for CMS Analysis,
http://cmsdoc.cern.ch/orca/

6. Conclusion

The experiences gained at HERA and LEP show the 
advantages of performing physics analyses not directly 
on the output of detector reconstruction software, but 
using a new level of abstraction with uniform access 
to reconstructed objects. This level is provided by the 
presented data analysis toolkit PAX (Physics Analysis 
Expert). The design of PAX was guided by the experi- 
ence of earlier experiments together with the demands 
arising in data analyses at future hadron colliders. Im- 
plemented in the C++ programming language, PAX 
provides a simple and intuitive programming interface 
to work within experiment specific C++ frameworks, 
but also on Monte Carlo generators. Event interpre- 
tation containers hold the relevant information about 
the event in terms of collisions, vertices, fourvectors, 
their relations, and additional values needed in the 
analysis. This enables the user to keep different inter- 
pretations of one event simultaneously and advance 
these in various directions. As PAX supports mod- 
ular analysis and even the buildup of analysis fac- 
tories, it facilitates team work of many physicists. 
PAX is suited for expert teams, physicists with limited 
time budget for data analyses, as well as newcomers. 
Groups within the experiments CDF and CMS are 
using PAX successfully for their analyses.